Individual Privacy vs Population Privacy: Learning to Attack Anonymization
Over the last decade there have been great strides made in developing
techniques to compute functions privately. In particular, Differential Privacy
gives strong promises about conclusions that can be drawn about an individual.
In contrast, various syntactic methods for providing privacy (criteria such as
k-anonymity and l-diversity) have been criticized for still allowing private
information of an individual to be inferred. In this report, we consider the
ability of an attacker to use data meeting privacy definitions to build an
accurate classifier. We demonstrate that even under Differential Privacy, such
classifiers can be used to accurately infer "private" attributes in realistic
data. We compare this to similar approaches for inference-based attacks on other
forms of anonymized data. We place these attacks on the same scale, and observe
that the accuracy of inference of private attributes for Differentially Private
data and l-diverse data can be quite similar.
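The style of attack the report considers can be illustrated with a toy sketch. The following is a minimal, hypothetical example, not the report's actual method: a data holder releases per-group counts of a sensitive attribute under ε-differential privacy via the Laplace mechanism, and an attacker uses the noisy release to classify each group's likely sensitive value. All names, group sizes, and parameters here are illustrative.

```python
import math
import random

def laplace(scale, rng):
    # Sample Laplace(0, scale) by inverse-CDF from a uniform draw.
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_release(counts, epsilon, rng):
    # Laplace mechanism: a counting query has sensitivity 1, so adding
    # Laplace(1/epsilon) noise to each count gives epsilon-DP per count.
    return {g: c + laplace(1.0 / epsilon, rng) for g, c in counts.items()}

def infer(noisy_counts, group_sizes, threshold=0.5):
    # Attacker's classifier: flag a group as "mostly sensitive-positive"
    # whenever the noisy fraction exceeds the threshold.
    return {g: noisy_counts[g] / group_sizes[g] > threshold
            for g in noisy_counts}

rng = random.Random(0)
# True counts of records with sensitive attribute = 1, per group of 100.
true_counts = {"groupA": 90, "groupB": 5}
released = dp_release(true_counts, epsilon=1.0, rng=rng)
prediction = infer(released, {"groupA": 100, "groupB": 100})
```

Even though each released count satisfies ε-DP, the attacker's guess about a typical member of a homogeneous group is still accurate; this gap between protecting an individual's record and protecting population-level inferences is what the report measures.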
Tight Lower Bound for Comparison-Based Quantile Summaries
Quantiles, such as the median or percentiles, provide concise and useful
information about the distribution of a collection of items, drawn from a
totally ordered universe. We study data structures, called quantile summaries,
which keep track of all quantiles of a stream of items, up to an error of at
most ε. That is, an ε-approximate quantile summary first processes a stream
of items and then, given any quantile query 0 ≤ φ ≤ 1, returns an item from
the stream which is a φ′-quantile for some φ′ = φ ± ε. We focus on
comparison-based quantile summaries that can only compare two items and are
otherwise completely oblivious of the universe. The best such deterministic
quantile summary to date, due to Greenwald and Khanna (SIGMOD '01), stores at
most O((1/ε) log(εN)) items, where N is the number of items in the stream. We
prove that this space bound is optimal by showing a matching lower bound. Our
result thus rules out the possibility of constructing a deterministic
comparison-based quantile summary in space f(ε) · o(log N), for any function f
that does not depend on N. As a corollary, we improve the lower bound for
biased quantiles, which provide a stronger, relative-error guarantee of
(1 ± ε)φ, and for other related computational tasks.
Comment: 20 pages, 2 figures, major revision of the construction (Sec. 3) and
some other parts of the paper
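To make the ε-approximate guarantee concrete, here is a minimal offline sketch (illustrative, and not the Greenwald-Khanna streaming structure): keeping about 1/(2ε) evenly spaced order statistics of the sorted input answers any quantile query within rank error εn. The paper's lower bound says a deterministic comparison-based summary built in one streaming pass cannot match this O(1/ε) offline space; the extra log(εN) factor is inherent.

```python
def build_summary(items, eps):
    # Offline: sort, then keep roughly 1/(2*eps) evenly spaced order
    # statistics; every rank is then within eps*n of some kept rank.
    s = sorted(items)
    n = len(s)
    step = max(1, int(2 * eps * n))
    kept = [(r, s[r]) for r in range(step // 2, n, step)]
    return kept, n

def query(summary, phi):
    # Return the stored item whose rank is closest to the target rank,
    # giving a phi'-quantile with |phi' - phi| <= eps.
    kept, n = summary
    target = phi * (n - 1)
    return min(kept, key=lambda rv: abs(rv[0] - target))[1]
```

For a stream of the integers 0..999 with ε = 0.05, the summary keeps 10 items, and every query lands within rank 50 of the true quantile.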
Engineering Streaming Algorithms
Streaming algorithms must process a large quantity of small updates quickly to allow queries about the input to be answered from a small summary. Initial work on streaming algorithms laid out theoretical results, and subsequent efforts have involved engineering these for practical use. Informed by experiments, streaming algorithms have been widely implemented and used in practice. This talk will survey this line of work, and identify some lessons learned.
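As one concrete example of the kind of compact summary such algorithms maintain (chosen here for illustration; the talk abstract does not name a specific algorithm), the classic Misra-Gries frequent-items summary uses at most k-1 counters and guarantees that any item occurring more than n/k times in a stream of n items survives in the summary:

```python
def misra_gries(stream, k):
    # Maintain at most k-1 counters. Each counter undercounts its item's
    # true frequency by at most n/k, so any item with true frequency
    # greater than n/k is guaranteed to remain in the summary.
    counters = {}
    for x in stream:
        if x in counters:
            counters[x] += 1
        elif len(counters) < k - 1:
            counters[x] = 1
        else:
            # Decrement every counter; drop counters that reach zero.
            for y in list(counters):
                counters[y] -= 1
                if counters[y] == 0:
                    del counters[y]
    return counters
```

The per-update cost is a constant number of dictionary operations (amortized), which is exactly the small-state, fast-update regime the talk describes.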
First Author Advantage: Citation Labeling in Research
Citations among research papers, and the networks they form, are the primary
object of study in scientometrics. The act of making a citation reflects the
citer's knowledge of the related literature, and of the work being cited. We
aim to gain insight into this process by studying citation keys: user-chosen
labels to identify a cited work. Our main observation is that the first listed
author is disproportionately represented in such labels, implying a strong
mental bias towards the first author.
Comment: Computational Scientometrics: Theory and Applications at The 22nd
CIKM 2013
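The measurement behind this observation can be sketched as follows, using a hypothetical helper rather than the paper's actual pipeline: given a citation key and the cited work's ordered author surnames, report which author, if any, the key embeds.

```python
def author_in_key(citation_key, surnames):
    # Return the 1-based position of the first author whose surname
    # appears in the citation key, or None if no surname matches.
    key = citation_key.lower()
    for pos, surname in enumerate(surnames, start=1):
        if surname.lower() in key:
            return pos
    return None
```

Aggregated over a corpus of keys, the distribution of returned positions would expose the first-author skew the paper reports.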
Scienceography: the study of how science is written
Scientific literature has itself been the subject of much scientific study,
for a variety of reasons: understanding how results are communicated, how ideas
spread, and assessing the influence of areas or individuals. However, most
prior work has focused on extracting and analyzing citation and stylistic
patterns. In this work, we introduce the notion of 'scienceography', which
focuses on the writing of science. We provide a first large scale study using
data derived from the arXiv e-print repository. Crucially, our data includes
the "source code" of scientific papers (the LaTeX source), which enables us to
study features not present in the "final product", such as the tools used and
private comments between authors. Our study identifies broad patterns and
trends in two example areas, computer science and mathematics, as well as
highlighting key differences in the way that science is written in these
fields. Finally, we outline future directions to extend the new topic of
scienceography.
Comment: 13 pages, 16 figures. Sixth International Conference on FUN WITH
ALGORITHMS, 2012
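A first step of the kind of source-level analysis described can be sketched as a toy pass over one LaTeX file (the function name and heuristics are illustrative, not the study's actual tooling): count comment-only lines, which may hold private notes between authors, and list the packages loaded, a rough proxy for the tools used.

```python
import re

def survey_tex(source):
    # Comment-only lines: the first non-blank character is '%'.
    # (This deliberately ignores inline comments after text.)
    comments = sum(1 for line in source.splitlines()
                   if line.lstrip().startswith("%"))
    # Packages loaded via \usepackage, with or without options.
    packages = re.findall(r"\\usepackage(?:\[[^\]]*\])?\{([^}]+)\}", source)
    return comments, packages
```

Running this over every .tex file in a corpus such as arXiv source dumps, and grouping by subject area, yields exactly the kind of field-level writing statistics the abstract describes.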